Generalization in Neural Speech Synthesis

Author

  • Gavin C. Cawley
Abstract

Previous research (e.g. Cawley [1, 2]) has demonstrated that artificial neural networks can be trained to generate the speech sounds corresponding to a sequence of phonetic tokens, including the effects of coarticulation required to produce natural sounding synthetic speech. The principal limiting factor in the performance of neural speech synthesizers has been found to lie in the amount of training data available. This paper presents the initial results of an investigation to determine the amount of training data required to reach optimal generalization in neural speech synthesizers, through an empirical exploration of the effects of the number of training patterns on test set error.
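
The kind of experiment the abstract describes, measuring test set error as the number of training patterns grows, can be illustrated with a learning curve. The sketch below is only a toy under stated assumptions: it fits a one-dimensional least-squares line rather than a neural speech synthesizer, and the function names (`make_data`, `fit_line`, `mse`, `learning_curve`), the target function, the noise level and the training set sizes are all hypothetical choices, not taken from the paper.

```python
import random


def make_data(n, rng, noise=0.3):
    """Draw n noisy samples of a hypothetical target y = 2x - 1."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x - 1.0 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given patterns."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(sizes, trials=30, n_test=200, seed=0):
    """Mean test-set error for each training set size, averaged over trials.

    Within a trial the smaller training sets are prefixes of the largest one,
    and every size is scored on the same held-out test set.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(sizes)
    for _ in range(trials):
        train_x, train_y = make_data(max(sizes), rng)
        test_x, test_y = make_data(n_test, rng)
        for i, n in enumerate(sizes):
            model = fit_line(train_x[:n], train_y[:n])
            totals[i] += mse(model, test_x, test_y)
    return [(n, total / trials) for n, total in zip(sizes, totals)]


if __name__ == "__main__":
    for n, err in learning_curve([4, 16, 64, 256]):
        print(f"{n:4d} training patterns: mean test MSE = {err:.4f}")
```

In a sketch like this, the point at which the curve flattens gives a rough indication of how many training patterns are needed before additional data stops improving generalization for the chosen model.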

Similar resources

Data-driven approaches for automatic detection of syllable boundaries

Syllabification is an essential component of many speech and language processing systems. The development of automatic speech recognizers frequently requires working with subword units such as syllables. More importantly, syllabification is an inevitable part of a speech synthesis system. In this paper we present data-driven approaches to supervised learning and automatic detection of syllable bo...

Picture my voice: Audio to visual speech synthesis using artificial neural networks

This paper presents an initial implementation and evaluation of a system that synthesizes visual speech directly from the acoustic waveform. An artificial neural network (ANN) was trained to map the cepstral coefficients of an individual’s natural speech to the control parameters of an animated synthetic talking head. We trained on two data sets; one was a set of 400 words spoken in isolation b...

Phone-based speech synthesis with neural network and articulatory control

This paper presents a novel method for synthesizing speech signal using a phone-based concatenation approach. Neural network is employed for the generalization of the phone templates during synthesis. Simplified articulatory space input parameters based on a modified vowel diagram are used to provide flexible and effective articulatory control. It also enables the design of an articulatory control m...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation

Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. T...

Publication date: 1998